Overview

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   428         431
Missing cells (%)               8.0%        8.1%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
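
The percentage and per-record figures in this table are consistent with simple ratios over the cell and row counts. The sketch below reproduces them under that assumption; the function names are illustrative and are not taken from the profiling library.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Missing cells as a percentage of all cells (variables x observations)."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def avg_record_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, given the total in-memory size in KiB."""
    return total_kib * 1024 / n_observations


# Figures copied from the Dataset statistics table.
pct_a = missing_cell_pct(428, 12, 446)
pct_b = missing_cell_pct(431, 12, 446)
record_bytes = avg_record_bytes(45.3, 446)
print(f"{pct_a:.1f}% {pct_b:.1f}% {record_bytes:.1f} B")
```

Rounded to one decimal place, the three values match the reported 8.0%, 8.1% and 104.0 B.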

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                              Dataset B                              Alert type
Age has 84 (18.8%) missing values      Age has 87 (19.5%) missing values      Missing
Cabin has 343 (76.9%) missing values   Cabin has 344 (77.1%) missing values   Missing
PassengerId has unique values          PassengerId has unique values          Unique
Name has unique values                 Name has unique values                 Unique
SibSp has 306 (68.6%) zeros            SibSp has 293 (65.7%) zeros            Zeros
Parch has 337 (75.6%) zeros            Parch has 339 (76.0%) zeros            Zeros
Fare has 9 (2.0%) zeros                Fare has 7 (1.6%) zeros                Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2024-03-18 18:33:39.300534   2024-03-18 18:33:43.302154
Analysis finished        2024-03-18 18:33:43.301050   2024-03-18 18:33:46.300350
Duration                 4 seconds                    3 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
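
The Duration row can be checked against the start and finish timestamps. The sketch below assumes the report rounds the elapsed time to the nearest whole second, which is consistent with both columns; the helper name is illustrative.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2024-03-18 18:33:39.300534", "2024-03-18 18:33:43.301050")
dur_b = duration_seconds("2024-03-18 18:33:43.302154", "2024-03-18 18:33:46.300350")
print(round(dur_a), round(dur_b))
```

Dataset B takes slightly under three seconds, so plain truncation would not give the reported "3 seconds".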

Variables

PassengerId
Real number (ℝ)

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           429.80717   435.92152
Minimum        1           2
Maximum        891         887
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     1           2
5-th percentile             42.5        47.25
Q1                          208.25      217.5
median                      413         432.5
Q3                          648.75      667.25
95-th percentile            841.75      830
Maximum                     891         887
Range                       890         885
Interquartile range (IQR)   440.5       449.75
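
Range and interquartile range follow directly from the other rows of this table (maximum minus minimum, and Q3 minus Q1). A minimal sketch of that arithmetic, with an illustrative helper name:

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Derive range and IQR from the four quantile values they depend on."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles copied from the table.
spread_a = spread(1, 208.25, 648.75, 891)
spread_b = spread(2, 217.5, 667.25, 887)
print(spread_a, spread_b)
```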

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                256.13708       256.90053
Coefficient of variation (CV)     0.59593487      0.58932748
Kurtosis                          -1.1850064      -1.2449463
Mean                              429.80717       435.92152
Median Absolute Deviation (MAD)   221             222.5
Skewness                          0.1154245       0.032879239
Sum                               191694          194421
Variance                          65606.205       65997.884
Monotonicity                      Not monotonic   Not monotonic
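
Two of these rows are functions of others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks the PassengerId figures under those standard definitions; small differences are expected because the inputs are already rounded.

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean."""
    return std / mean


def variance_from_std(std: float) -> float:
    """Variance = standard deviation squared."""
    return std ** 2


# Values copied from the PassengerId tables.
cv_a = coefficient_of_variation(256.13708, 429.80717)
cv_b = coefficient_of_variation(256.90053, 435.92152)
var_a = variance_from_std(256.13708)
var_b = variance_from_std(256.90053)
print(f"{cv_a:.6f} {cv_b:.6f} {var_a:.2f} {var_b:.2f}")
```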
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value                Count   Frequency (%)
278                  1       0.2%
187                  1       0.2%
1                    1       0.2%
331                  1       0.2%
833                  1       0.2%
103                  1       0.2%
448                  1       0.2%
220                  1       0.2%
784                  1       0.2%
641                  1       0.2%
Other values (436)   436     97.8%

Dataset B
Value                Count   Frequency (%)
51                   1       0.2%
823                  1       0.2%
875                  1       0.2%
365                  1       0.2%
797                  1       0.2%
110                  1       0.2%
564                  1       0.2%
293                  1       0.2%
521                  1       0.2%
411                  1       0.2%
Other values (436)   436     97.8%
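
Each value table lists the ten most frequent values and folds the remainder into an "Other values (n)" row. The sketch below shows one way to build such a table with `collections.Counter`. It is an assumption about the layout, not the library's implementation, and the input is a synthetic stand-in of 446 unique identifiers rather than the real PassengerId column.

```python
from collections import Counter


def common_values(values, top_n=10):
    """Return (label, count, percent) rows: the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(value, count, 100.0 * count / total) for value, count in top]
    other_distinct = len(counts) - len(top)
    other_count = total - sum(count for _, count in top)
    if other_distinct > 0:
        label = f"Other values ({other_distinct})"
        rows.append((label, other_count, 100.0 * other_count / total))
    return rows


# Synthetic stand-in: 446 distinct identifiers, each occurring once.
rows = common_values(list(range(1, 447)))
for label, count, pct in rows:
    print(f"{label}  {count}  {pct:.1f}%")
```

With every value unique, each listed row is 0.2% and the final row is "Other values (436)" at 97.8%, the same shape as the tables above.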

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
6       1       0.2%
9       1       0.2%
11      1       0.2%
12      1       0.2%
13      1       0.2%
14      1       0.2%
15      1       0.2%
16      1       0.2%

Dataset B
Value   Count   Frequency (%)
2       1       0.2%
3       1       0.2%
7       1       0.2%
9       1       0.2%
11      1       0.2%
13      1       0.2%
16      1       0.2%
17      1       0.2%
18      1       0.2%
20      1       0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: 0 (282), 1 (164)
Value counts, Dataset B: 0 (268), 1 (178)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
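
As an illustration of character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a column of values. It is not how the report computes its figures (the report lists the category as "(unknown)"), and the input is reconstructed from the Dataset A value counts for Survived.

```python
import unicodedata
from collections import Counter


def category_counts(values) -> Counter:
    """Count Unicode general categories over every character of every value."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts


# Survived in Dataset A: 282 zeros and 164 ones, each a single digit character.
counts = category_counts(["0"] * 282 + ["1"] * 164)
print(counts)
```

All 446 characters are decimal digits, so a single category ("Nd") accounts for the whole column, in line with "Distinct categories 1" and "Total characters 446".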

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   0           0
2nd row   0           1
3rd row   1           1
4th row   0           1
5th row   0           1

Common Values

Dataset A
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%

Dataset B
Value   Count   Frequency (%)
0       268     60.1%
1       178     39.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plot for Dataset A and for Dataset B]

Words

Dataset A
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%

Dataset B
Value   Count   Frequency (%)
0       268     60.1%
1       178     39.9%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%

Dataset B
Value   Count   Frequency (%)
0       268     60.1%
1       178     39.9%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: 3 (243), 1 (110), 2 (93)
Value counts, Dataset B: 3 (245), 1 (105), 2 (96)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   2           3
2nd row   2           3
3rd row   3           3
4th row   3           3
5th row   3           3

Common Values

Dataset A
Value   Count   Frequency (%)
3       243     54.5%
1       110     24.7%
2       93      20.9%

Dataset B
Value   Count   Frequency (%)
3       245     54.9%
1       105     23.5%
2       96      21.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plot for Dataset A and for Dataset B]

Words

Dataset A
Value   Count   Frequency (%)
3       243     54.5%
1       110     24.7%
2       93      20.9%

Dataset B
Value   Count   Frequency (%)
3       245     54.9%
1       105     23.5%
2       96      21.5%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
3       243     54.5%
1       110     24.7%
2       93      20.9%

Dataset B
Value   Count   Frequency (%)
3       245     54.9%
1       105     23.5%
2       96      21.5%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          82
Median length   48          49
Mean length     27.029148   26.921525
Min length      13          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      12055       12007
Distinct characters   59          60
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                     Dataset B
1st row   Parkes, Mr. Francis "Frank"   Panula, Master. Juha Niilo
2nd row   Hold, Mr. Stephen             Madsen, Mr. Fridtjof Arne
3rd row   Moor, Mrs. (Beila)            Stranden, Mr. Juho
4th row   Coleff, Mr. Satio             Peter, Mrs. Catherine (Catherine Rizk)
5th row   Flynn, Mr. James              Mannion, Miss. Margareth

Words

Dataset A
Value                Count   Frequency (%)
mr                   265     14.5%
miss                 90      4.9%
mrs                  57      3.1%
william              33      1.8%
john                 25      1.4%
master               23      1.3%
henry                22      1.2%
george               15      0.8%
james                15      0.8%
thomas               12      0.7%
Other values (886)   1265    69.4%

Dataset B
Value                Count   Frequency (%)
mr                   257     14.1%
miss                 94      5.2%
mrs                  68      3.7%
william              34      1.9%
john                 24      1.3%
master               18      1.0%
george               13      0.7%
edward               12      0.7%
charles              12      0.7%
henry                11      0.6%
Other values (916)   1276    70.1%
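
The word tables count lowercased tokens such as "mr" and "miss". The sketch below uses a hypothetical tokenizer (split on whitespace, strip surrounding punctuation, lowercase) that is consistent with those tokens. The library's actual tokenization is not shown in this report, so this is an assumption.

```python
import string
from collections import Counter


def word_counts(values) -> Counter:
    """Count lowercased whitespace-separated tokens with outer punctuation stripped."""
    counts = Counter()
    for value in values:
        for token in str(value).split():
            word = token.strip(string.punctuation).lower()
            if word:
                counts[word] += 1
    return counts


# The five Dataset A sample rows for Name shown in the Sample table.
sample = [
    'Parkes, Mr. Francis "Frank"',
    "Hold, Mr. Stephen",
    "Moor, Mrs. (Beila)",
    "Coleff, Mr. Satio",
    "Flynn, Mr. James",
]
counts = word_counts(sample)
print(counts.most_common(3))
```

On these five rows, "mr" is counted four times and "mrs" once; interior punctuation is kept, so a token like "C.A." would become "c.a".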

Most occurring characters

Dataset A
Value               Count   Frequency (%)
(space)             1377    11.4%
r                   1009    8.4%
e                   883     7.3%
a                   799     6.6%
i                   651     5.4%
n                   641     5.3%
s                   638     5.3%
M                   560     4.6%
l                   525     4.4%
o                   523     4.3%
Other values (49)   4449    36.9%

Dataset B
Value               Count   Frequency (%)
(space)             1375    11.5%
r                   949     7.9%
e                   855     7.1%
a                   813     6.8%
i                   662     5.5%
n                   650     5.4%
s                   641     5.3%
M                   559     4.7%
l                   539     4.5%
o                   520     4.3%
Other values (50)   4444    37.0%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   12055   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12007   100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   12055   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12007   100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   12055   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12007   100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: male (299), female (147)
Value counts, Dataset B: male (283), female (163)

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.6591928   4.7309417
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2078        2110
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        male
2nd row   male        male
3rd row   female      male
4th row   male        female
5th row   male        female

Common Values

Dataset A
Value    Count   Frequency (%)
male     299     67.0%
female   147     33.0%

Dataset B
Value    Count   Frequency (%)
male     283     63.5%
female   163     36.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plot for Dataset A and for Dataset B]

Words

Dataset A
Value    Count   Frequency (%)
male     299     67.0%
female   147     33.0%

Dataset B
Value    Count   Frequency (%)
male     283     63.5%
female   163     36.5%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       593     28.5%
m       446     21.5%
a       446     21.5%
l       446     21.5%
f       147     7.1%

Dataset B
Value   Count   Frequency (%)
e       609     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       163     7.7%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   2078    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2110    100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   2078    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2110    100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   2078    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2110    100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Age
Real number (ℝ)

               Dataset A   Dataset B
Distinct       73          76
Distinct (%)   20.2%       21.2%
Missing        84          87
Missing (%)    18.8%       19.5%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.606133   28.957047
Minimum        0.42        0.75
Maximum        80          74
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.42        0.75
5-th percentile             4           4
Q1                          21          20.25
median                      28          28
Q3                          38          36
95-th percentile            55.95       54
Maximum                     80          74
Range                       79.58       73.25
Interquartile range (IQR)   17          15.75

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                14.524854       14.014288
Coefficient of variation (CV)     0.49060287      0.48396813
Kurtosis                          0.15947525      0.18745704
Mean                              29.606133       28.957047
Median Absolute Deviation (MAD)   8               8
Skewness                          0.38757552      0.38215533
Sum                               10717.42        10395.58
Variance                          210.97138       196.40027
Monotonicity                      Not monotonic   Not monotonic
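
Age is the one numeric variable here with missing values. Its reported mean is consistent with dividing the sum by the number of non-missing observations rather than by all 446 rows. A minimal sketch of that check, assuming missing values are simply skipped (the helper name is illustrative):

```python
def mean_of_present(total_sum: float, n_rows: int, n_missing: int) -> float:
    """Mean over non-missing observations only."""
    return total_sum / (n_rows - n_missing)


# Sum and missing counts copied from the Age tables.
mean_a = mean_of_present(10717.42, 446, 84)  # 362 non-missing rows
mean_b = mean_of_present(10395.58, 446, 87)  # 359 non-missing rows
print(f"{mean_a:.4f} {mean_b:.4f}")
```

Dividing by all 446 rows instead would give roughly 24.0 and 23.3, well below the reported means.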
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value               Count   Frequency (%)
24                  20      4.5%
18                  15      3.4%
28                  14      3.1%
22                  12      2.7%
21                  12      2.7%
25                  12      2.7%
30                  12      2.7%
35                  11      2.5%
27                  10      2.2%
36                  10      2.2%
Other values (63)   234     52.5%
(Missing)           84      18.8%

Dataset B
Value               Count   Frequency (%)
24                  18      4.0%
28                  15      3.4%
22                  14      3.1%
18                  13      2.9%
21                  13      2.9%
30                  13      2.9%
29                  13      2.9%
25                  13      2.9%
19                  11      2.5%
26                  10      2.2%
Other values (66)   226     50.7%
(Missing)           87      19.5%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0.42    1       0.2%
0.75    1       0.2%
0.83    1       0.2%
0.92    1       0.2%
1       2       0.4%
2       6       1.3%
3       3       0.7%
4       6       1.3%
5       2       0.4%
7       2       0.4%

Dataset B
Value   Count   Frequency (%)
0.75    2       0.4%
0.83    2       0.4%
0.92    1       0.2%
1       3       0.7%
2       6       1.3%
3       1       0.2%
4       5       1.1%
5       1       0.2%
6       1       0.2%
7       3       0.7%

SibSp
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.53811659   0.52690583
Minimum        0            0
Maximum        8            8
Zeros          306          293
Zeros (%)      68.6%        65.7%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.108549        0.97747339
Coefficient of variation (CV)     2.0600536       1.8551197
Kurtosis                          15.487631       12.147253
Mean                              0.53811659      0.52690583
Median Absolute Deviation (MAD)   0               0
Skewness                          3.4129477       2.9624521
Sum                               240             235
Variance                          1.2288809       0.95545422
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Common values

Dataset A
Value   Count   Frequency (%)
0       306     68.6%
1       95      21.3%
2       20      4.5%
3       10      2.2%
4       9       2.0%
5       3       0.7%
8       3       0.7%

Dataset B
Value   Count   Frequency (%)
0       293     65.7%
1       113     25.3%
2       18      4.0%
4       9       2.0%
3       9       2.0%
5       3       0.7%
8       1       0.2%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       306     68.6%
1       95      21.3%
2       20      4.5%
3       10      2.2%
4       9       2.0%
5       3       0.7%
8       3       0.7%

Dataset B
Value   Count   Frequency (%)
0       293     65.7%
1       113     25.3%
2       18      4.0%
3       9       2.0%
4       9       2.0%
5       3       0.7%
8       1       0.2%

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.39910314   0.39237668
Minimum        0            0
Maximum        5            6
Zeros          337          339
Zeros (%)      75.6%        76.0%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.83593403      0.82425871
Coefficient of variation (CV)     2.0945313       2.1006822
Kurtosis                          8.8810356       9.9949191
Mean                              0.39910314      0.39237668
Median Absolute Deviation (MAD)   0               0
Skewness                          2.6870051       2.7416945
Sum                               178             175
Variance                          0.69878571      0.67940243
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
Common values

Dataset A
Value   Count   Frequency (%)
0       337     75.6%
1       59      13.2%
2       41      9.2%
5       4       0.9%
3       3       0.7%
4       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       339     76.0%
1       55      12.3%
2       45      10.1%
3       2       0.4%
5       2       0.4%
4       2       0.4%
6       1       0.2%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       337     75.6%
1       59      13.2%
2       41      9.2%
3       3       0.7%
4       2       0.4%
5       4       0.9%

Dataset B
Value   Count   Frequency (%)
0       339     76.0%
1       55      12.3%
2       45      10.1%
3       2       0.4%
4       2       0.4%
5       2       0.4%
6       1       0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       379         384
Distinct (%)   85.0%       86.1%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.7443946   6.8475336
Min length      4           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      3008        3054
Distinct characters   31          32
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       331         339
Unique (%)   74.2%       76.0%

Sample

          Dataset A   Dataset B
1st row   239853      3101295
2nd row   26707       C 17369
3rd row   392096      STON/O 2. 3101288
4th row   349209      2668
5th row   364851      36866

Words

Dataset A
Value                Count   Frequency (%)
pc                   31      5.5%
c.a                  14      2.5%
a/5                  10      1.8%
ston/o               7       1.2%
2                    7       1.2%
ca                   7       1.2%
w./c                 6       1.1%
347082               5       0.9%
soton/oq             5       0.9%
soton/o.q            4       0.7%
Other values (396)   472     83.1%

Dataset B
Value                Count   Frequency (%)
pc                   27      4.7%
c.a                  14      2.4%
a/5                  10      1.7%
ston/o               8       1.4%
2                    8       1.4%
w./c                 6       1.0%
soton/o.q            6       1.0%
ca                   6       1.0%
sc/paris             5       0.9%
soton/oq             4       0.7%
Other values (403)   480     83.6%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
3                   370     12.3%
1                   338     11.2%
2                   302     10.0%
7                   254     8.4%
4                   222     7.4%
6                   218     7.2%
0                   199     6.6%
5                   186     6.2%
8                   154     5.1%
9                   154     5.1%
Other values (21)   611     20.3%

Dataset B
Value               Count   Frequency (%)
3                   364     11.9%
1                   347     11.4%
2                   292     9.6%
4                   236     7.7%
7                   234     7.7%
0                   219     7.2%
6                   208     6.8%
5                   188     6.2%
9                   167     5.5%
8                   147     4.8%
Other values (22)   652     21.3%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   3008    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3054    100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   3008    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3054    100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   3008    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3054    100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Fare
Real number (ℝ)

               Dataset A   Dataset B
Distinct       185         176
Distinct (%)   41.5%       39.5%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           32.388554   32.95637
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          9           7
Zeros (%)      2.0%        1.6%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.125       7.162525
Q1                          7.8958      7.925
median                      13.93125    14.45625
Q3                          31.275      30.375
95-th percentile            118.31875   120
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.3792     22.45

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                48.137637       50.150937
Coefficient of variation (CV)     1.4862546       1.5217373
Kurtosis                          27.718145       24.872053
Mean                              32.388554       32.95637
Median Absolute Deviation (MAD)   6.70205         6.81665
Skewness                          4.2873395       4.1680823
Sum                               14445.295       14698.541
Variance                          2317.2321       2515.1165
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value                Count   Frequency (%)
8.05                 22      4.9%
13                   19      4.3%
7.8958               19      4.3%
10.5                 16      3.6%
7.75                 10      2.2%
26                   10      2.2%
0                    9       2.0%
26.55                8       1.8%
7.2292               8       1.8%
7.925                8       1.8%
Other values (175)   317     71.1%

Dataset B
Value                Count   Frequency (%)
7.8958               24      5.4%
26                   22      4.9%
8.05                 20      4.5%
13                   18      4.0%
7.75                 15      3.4%
10.5                 12      2.7%
7.925                12      2.7%
7.775                10      2.2%
8.6625               9       2.0%
0                    7       1.6%
Other values (166)   297     66.6%

Minimum 10 values

Dataset A
Value    Count   Frequency (%)
0        9       2.0%
6.45     1       0.2%
6.4958   2       0.4%
6.75     1       0.2%
6.8583   1       0.2%
6.95     1       0.2%
6.975    2       0.4%
7.05     3       0.7%
7.0542   1       0.2%
7.125    3       0.7%

Dataset B
Value    Count   Frequency (%)
0        7       1.6%
5        1       0.2%
6.4375   1       0.2%
6.45     1       0.2%
6.4958   1       0.2%
6.75     1       0.2%
6.95     1       0.2%
7.0458   1       0.2%
7.05     4       0.9%
7.0542   2       0.4%

Cabin
Text

               Dataset A   Dataset B
Distinct       88          88
Distinct (%)   85.4%       86.3%
Missing        343         344
Missing (%)    76.9%       77.1%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.5436893   3.8529412
Min length      2           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      365         393
Distinct characters   18          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       76          77
Unique (%)   73.8%       75.5%

Sample

          Dataset A   Dataset B
1st row   E121        E10
2nd row   C65         D48
3rd row   A10         E8
4th row   B35         E63
5th row   C22 C26     E40

Words

Dataset A
Value               Count   Frequency (%)
c22                 3       2.5%
f2                  3       2.5%
f33                 3       2.5%
c26                 3       2.5%
c92                 2       1.7%
e101                2       1.7%
b5                  2       1.7%
c25                 2       1.7%
c23                 2       1.7%
c27                 2       1.7%
Other values (89)   94      79.7%

Dataset B
Value               Count   Frequency (%)
c22                 3       2.4%
b96                 3       2.4%
b98                 3       2.4%
c23                 3       2.4%
c25                 3       2.4%
c27                 3       2.4%
c26                 3       2.4%
c2                  2       1.6%
e8                  2       1.6%
g6                  2       1.6%
Other values (91)   99      78.6%

Most occurring characters

Dataset A
Value              Count   Frequency (%)
C                  41      11.2%
2                  38      10.4%
1                  34      9.3%
3                  32      8.8%
B                  26      7.1%
6                  24      6.6%
0                  21      5.8%
5                  20      5.5%
8                  19      5.2%
4                  16      4.4%
Other values (8)   94      25.8%

Dataset B
Value              Count   Frequency (%)
C                  40      10.2%
2                  40      10.2%
1                  36      9.2%
B                  34      8.7%
6                  32      8.1%
5                  30      7.6%
(space)            24      6.1%
7                  21      5.3%
3                  20      5.1%
D                  19      4.8%
Other values (9)   97      24.7%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   365     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   393     100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   365     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   393     100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   365     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   393     100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           0
Missing (%)    0.2%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: S (326), C (82), Q (37)
Value counts, Dataset B: S (333), C (76), Q (37)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           S
2nd row   S           S
3rd row   S           S
4th row   S           C
5th row   Q           Q

Common Values

Dataset A
Value       Count   Frequency (%)
S           326     73.1%
C           82      18.4%
Q           37      8.3%
(Missing)   1       0.2%

Dataset B
Value   Count   Frequency (%)
S       333     74.7%
C       76      17.0%
Q       37      8.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plot for Dataset A and for Dataset B]

Words

Dataset A
Value   Count   Frequency (%)
s       326     73.3%
c       82      18.4%
q       37      8.3%

Dataset B
Value   Count   Frequency (%)
s       333     74.7%
c       76      17.0%
q       37      8.3%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
S       326     73.3%
C       82      18.4%
Q       37      8.3%

Dataset B
Value   Count   Frequency (%)
S       333     74.7%
C       76      17.0%
Q       37      8.3%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per category

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per script

(unknown): same counts as the "Most occurring characters" tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per block

(unknown): same counts as the "Most occurring characters" tables above.

Interactions

[Figures: 25 interaction plots, each rendered once for Dataset A and once for Dataset B]

Missing values

Dataset A

Figure: A simple visualization of nullity by column.

Dataset B

Figure: A simple visualization of nullity by column.

Dataset A

Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
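
The per-column nullity counts that such a visualization summarizes can be reproduced in a few lines. The sketch below is illustrative only and is not the report's own code: it uses plain Python, the helper name `nullity_by_column` is hypothetical, and the three rows are a subset of columns taken from the Dataset A sample, with `None` standing in for NaN.

```python
from typing import Any, Dict, List, Optional


def nullity_by_column(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, int]:
    """Count missing (None) values per column across a list of row dicts."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            # Register every column so fully populated ones report 0.
            counts.setdefault(column, 0)
            if value is None:
                counts[column] += 1
    return counts


# Three rows from the Dataset A sample; None stands in for NaN.
rows = [
    {"PassengerId": 278, "Age": None, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 237, "Age": 44.0, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 824, "Age": 27.0, "Cabin": "E121", "Embarked": "S"},
]

missing = nullity_by_column(rows)
for column, count in missing.items():
    print(f"{column}: {count} missing ({count / len(rows):.1%})")
```

Applied to a full dataset, the same count per column divided by the number of observations gives percentages of the kind listed in the Alerts section.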

Sample

Dataset A

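|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 277 | 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 236 | 237 | 0 | 2 | Hold, Mr. Stephen | male | 44.0 | 1 | 0 | 26707 | 26.0000 | NaN | S |
| 823 | 824 | 1 | 3 | Moor, Mrs. (Beila) | female | 27.0 | 0 | 1 | 392096 | 12.4750 | E121 | S |
| 514 | 515 | 0 | 3 | Coleff, Mr. Satio | male | 24.0 | 0 | 0 | 349209 | 7.4958 | NaN | S |
| 428 | 429 | 0 | 3 | Flynn, Mr. James | male | NaN | 0 | 0 | 364851 | 7.7500 | NaN | Q |
| 774 | 775 | 1 | 2 | Hocking, Mrs. Elizabeth (Eliza Needs) | female | 54.0 | 1 | 3 | 29105 | 23.0000 | NaN | S |
| 307 | 308 | 1 | 1 | Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo) | female | 17.0 | 1 | 0 | PC 17758 | 108.9000 | C65 | C |
| 624 | 625 | 0 | 3 | Bowen, Mr. David John "Dai" | male | 21.0 | 0 | 0 | 54636 | 16.1000 | NaN | S |
| 583 | 584 | 0 | 1 | Ross, Mr. John Hugo | male | 36.0 | 0 | 0 | 13049 | 40.1250 | A10 | C |
| 355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S |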

Dataset B

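|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 50 | 51 | 0 | 3 | Panula, Master. Juha Niilo | male | 7.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 127 | 128 | 1 | 3 | Madsen, Mr. Fridtjof Arne | male | 24.0 | 0 | 0 | C 17369 | 7.1417 | NaN | S |
| 744 | 745 | 1 | 3 | Stranden, Mr. Juho | male | 31.0 | 0 | 0 | STON/O 2. 3101288 | 7.9250 | NaN | S |
| 533 | 534 | 1 | 3 | Peter, Mrs. Catherine (Catherine Rizk) | female | NaN | 0 | 2 | 2668 | 22.3583 | NaN | C |
| 727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
| 274 | 275 | 1 | 3 | Healy, Miss. Hanora "Nora" | female | NaN | 0 | 0 | 370375 | 7.7500 | NaN | Q |
| 694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 | 0 | 0 | 113800 | 26.5500 | NaN | S |
| 471 | 472 | 0 | 3 | Cacic, Mr. Luka | male | 38.0 | 0 | 0 | 315089 | 8.6625 | NaN | S |
| 429 | 430 | 1 | 3 | Pickard, Mr. Berk (Berk Trembisky) | male | 32.0 | 0 | 0 | SOTON/O.Q. 392078 | 8.0500 | E10 | S |
| 729 | 730 | 0 | 3 | Ilmakangas, Miss. Pieta Sofia | female | 25.0 | 1 | 0 | STON/O2. 3101271 | 7.9250 | NaN | S |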

Dataset A

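|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 83 | 84 | 0 | 1 | Carrau, Mr. Francisco M | male | 28.0 | 0 | 0 | 113059 | 47.1000 | NaN | S |
| 847 | 848 | 0 | 3 | Markoff, Mr. Marin | male | 35.0 | 0 | 0 | 349213 | 7.8958 | NaN | C |
| 191 | 192 | 0 | 2 | Carbines, Mr. William | male | 19.0 | 0 | 0 | 28424 | 13.0000 | NaN | S |
| 743 | 744 | 0 | 3 | McNamee, Mr. Neal | male | 24.0 | 1 | 0 | 376566 | 16.1000 | NaN | S |
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 92 | 93 | 0 | 1 | Chaffee, Mr. Herbert Fuller | male | 46.0 | 1 | 0 | W.E.P. 5734 | 61.1750 | E31 | S |
| 213 | 214 | 0 | 2 | Givard, Mr. Hans Kristensen | male | 30.0 | 0 | 0 | 250646 | 13.0000 | NaN | S |
| 304 | 305 | 0 | 3 | Williams, Mr. Howard Hugh "Harry" | male | NaN | 0 | 0 | A/5 2466 | 8.0500 | NaN | S |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
| 185 | 186 | 0 | 1 | Rood, Mr. Hugh Roscoe | male | NaN | 0 | 0 | 113767 | 50.0000 | A32 | S |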

Dataset B

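|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 232 | 233 | 0 | 2 | Sjostedt, Mr. Ernst Adolf | male | 59.0 | 0 | 0 | 237442 | 13.5000 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 616 | 617 | 0 | 3 | Danbom, Mr. Ernst Gilbert | male | 34.0 | 1 | 1 | 347080 | 14.4000 | NaN | S |
| 480 | 481 | 0 | 3 | Goodwin, Master. Harold Victor | male | 9.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 234 | 235 | 0 | 2 | Leyson, Mr. Robert William Norman | male | 24.0 | 0 | 0 | C.A. 29566 | 10.5000 | NaN | S |
| 93 | 94 | 0 | 3 | Dean, Mr. Bertram Frank | male | 26.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S |
| 830 | 831 | 1 | 3 | Yasbeck, Mrs. Antoni (Selini Alexander) | female | 15.0 | 1 | 0 | 2659 | 14.4542 | NaN | C |
| 716 | 717 | 1 | 1 | Endres, Miss. Caroline Louise | female | 38.0 | 0 | 0 | PC 17757 | 227.5250 | C45 | C |
| 585 | 586 | 1 | 1 | Taussig, Miss. Ruth | female | 18.0 | 0 | 2 | 110413 | 79.6500 | E68 | S |
| 771 | 772 | 0 | 3 | Jensen, Mr. Niels Peder | male | 48.0 | 0 | 0 | 350047 | 7.8542 | NaN | S |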

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
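
A duplicate-row check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: rows are modelled as tuples of a few columns, and the helper `count_duplicate_rows` is hypothetical. An empty result corresponds to the message "Dataset does not contain duplicate rows."

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Row = Tuple[Hashable, ...]


def count_duplicate_rows(rows: Iterable[Row]) -> Dict[Row, int]:
    """Return each row that occurs more than once, with its occurrence count."""
    occurrences = Counter(rows)
    return {row: n for row, n in occurrences.items() if n > 1}


# Two rows from the Dataset B sample (PassengerId, Survived, Pclass, Sex).
unique_rows = [(51, 0, 3, "male"), (128, 1, 3, "male")]
print(count_duplicate_rows(unique_rows))  # no repeated rows, so the dict is empty

# A contrived repeat, added here only to show the non-empty case.
with_repeat = unique_rows + [(51, 0, 3, "male")]
print(count_duplicate_rows(with_repeat))
```

Because PassengerId is unique in both datasets, no full row can repeat, which is consistent with the empty duplicate tables above.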